Can punctuation help learning?
Abstract
The quality of learnt natural language grammars can be enhanced by exploiting the linguistic devices that comprise a corpus. This paper considers one such device, namely punctuation. After briefly considering the linguistics of punctuation, a model capturing some of these properties is presented. Following this, a series of experiments learning unification-based natural language grammars, using the Spoken English Corpus as data, demonstrate that even a simple model of punctuation increases the plausibility of learnt grammars over grammars learnt without the use of punctuation.
Similar resources
Exploring The Role Of Punctuation In Parsing Natural Text
Few, if any, current NLP systems make any significant use of punctuation. Intuitively, a treatment of punctuation seems necessary to the analysis and production of text. Whilst this has been suggested in the fields of discourse structure, it is still unclear whether punctuation can help in the syntactic field. This investigation attempts to answer this question by parsing some corpus-based ma...
Baldwin, Timothy and Manuel Paul Anil Kumar Joseph (2009) Restoring Punctuation and Casing in English Text, In Proceedings of the 22nd Australian Joint Conference on Artificial Intelligence (AI09), Melbourne, Australia, pp. 547-556
This paper explores the use of machine learning techniques to restore punctuation and case in English text, as part of which it investigates the co-dependence of case information and punctuation. We achieve an overall F-score of .619 for the task using a variety of lexical and contextual features, and iterative retagging.
Deep Learning for Punctuation Restoration in Medical Reports
In clinical dictation, speakers try to be as concise as possible to save time, often resulting in utterances without explicit punctuation commands. Since the end product of a dictated report, e.g. an out-patient letter, does require correct orthography, including exact punctuation, the latter need to be restored, preferably by automated means. This paper describes a method for punctuation resto...
Parsing and Subcategorization Data
In this paper, we compare the performance of a state-of-the-art statistical parser (Bikel, 2004) in parsing written and spoken language and in generating subcategorization cues from written and spoken language. Although Bikel’s parser achieves a higher accuracy for parsing written language, it achieves a higher accuracy when extracting subcategorization cues from spoken language. Our experiment...
Get control of your commas
The comma is arguably the most misunderstood of punctuation tools. Ask someone about comma rules and even those who begin with confidence are likely to trail off apologetically. This is because comma use is not fully explained by rules; it depends in part on taste. But, as David Crystal insists in his history of punctuation, variation in comma use is neither infinite nor totally idiosyncratic [1]. It tur...